Gradient Boosting Algorithms

Gradient boosting is a powerful ensemble learning technique that builds a series of decision trees sequentially, each tree aiming to correct the errors made by its predecessors. At its core, it combines two ideas: boosting, a method for turning a collection of weak learners into a single strong one, and gradient descent, an optimization technique for minimizing a loss function. The algorithm trains a number of weak learners, typically shallow decision trees, on the given data and combines their outputs to produce a more accurate and robust prediction model.

The key idea is that by focusing on the errors of the previous trees, the model can iteratively improve its performance. The process begins by fitting an initial model to the data and computing the residuals, i.e., the differences between the true target values and the current predictions. A new decision tree is then built to predict these residuals, and its output is added to the running prediction, scaled by a learning rate. This is repeated for a predefined number of iterations or until a stopping criterion is met, such as no significant improvement in performance or reaching a maximum number of trees.

The final model is the weighted sum of all the decision trees, where the weights are determined by the learning rate and each tree's contribution to reducing the residuals. By aggregating many weak learners in this way, gradient boosting can capture complex patterns and relationships in the data, yielding highly accurate and generalizable models.
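To make the update rule concrete, below is a minimal from-scratch sketch of gradient boosting for regression with squared-error loss, using shallow rpart trees as the weak learners. It is illustrative only: x_train (a data frame of predictors), y_train (a numeric target), n_trees, and lr are assumed names, not objects defined elsewhere in this section.

# Gradient boosting from scratch: regression with squared-error loss
library(rpart)

n_trees <- 100   # number of boosting iterations (assumed value)
lr      <- 0.1   # learning rate (shrinkage)

# Initial model: a constant prediction, the mean of the target
pred  <- rep(mean(y_train), length(y_train))
trees <- vector("list", n_trees)

for (i in seq_len(n_trees)) {
  res <- y_train - pred                 # residuals of the current ensemble
  # Fit a shallow tree to the residuals
  trees[[i]] <- rpart(res ~ ., data = x_train,
                      control = rpart.control(maxdepth = 2))
  # Step the running prediction toward the residuals, scaled by the learning rate
  pred <- pred + lr * predict(trees[[i]], x_train)
}

# Final model: initial constant plus the learning-rate-weighted sum of all trees
boost_predict <- function(new_x) {
  out <- rep(mean(y_train), nrow(new_x))
  for (tree in trees) out <- out + lr * predict(tree, new_x)
  out
}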
# GBM
library(caret)
# Combine predictors and target into one data frame, naming the target "y"
# so it matches the model formula below
x <- data.frame(x_train, y = as.factor(y_train))
# Fitting model: 4-fold cross-validation repeated 4 times; classProbs = TRUE
# is required for type = "prob" predictions (class levels must be valid R names)
fitControl <- trainControl(method = "repeatedcv", number = 4, repeats = 4,
                           classProbs = TRUE)
fit <- train(y ~ ., data = x, method = "gbm", trControl = fitControl,
             verbose = FALSE)
# Predicted probability of the second class for the test set
predicted <- predict(fit, x_test, type = "prob")[, 2]
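As a quick sanity check, the probabilities can be thresholded at 0.5 and compared against held-out labels; this assumes a binary outcome and that y_test (not defined in the snippet above) holds the true test labels.

# Threshold the second-class probability at 0.5 and compute accuracy
# (assumes binary classification; y_test holds the true labels)
pred_class <- ifelse(predicted > 0.5, levels(x$y)[2], levels(x$y)[1])
mean(pred_class == as.character(y_test))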



# XGBoost
library(caret)
x <- data.frame(x_train, y = y_train)  # name the target "y" to match the formula
# Fitting model: 10-fold cross-validation repeated 4 times
TrainControl <- trainControl(method = "repeatedcv", number = 10, repeats = 4)
# Linear booster
model <- train(y ~ ., data = x, method = "xgbLinear", trControl = TrainControl,
               verbose = FALSE)
# OR the tree booster
model <- train(y ~ ., data = x, method = "xgbTree", trControl = TrainControl,
               verbose = FALSE)
predicted <- predict(model, x_test)



# LightGBM
library(RLightGBM)
data(example.binary)
# Parameters
num_iterations <- 100
config <- list(objective = "binary",
               metric = "binary_logloss,auc",
               learning_rate = 0.1,
               num_leaves = 63,
               tree_learner = "serial",
               feature_fraction = 0.8,
               bagging_freq = 5,
               bagging_fraction = 0.8,
               min_data_in_leaf = 50,
               min_sum_hessian_in_leaf = 5.0)
# Create data handle and booster
handle.data <- lgbm.data.create(x)
lgbm.data.setField(handle.data, "label", y)
handle.booster <- lgbm.booster.create(handle.data, lapply(config, as.character))
# Train for num_iterations iterations and eval every 5 steps
lgbm.booster.train(handle.booster, num_iterations, 5)
# Predict
pred <- lgbm.booster.predict(handle.booster, x.test)
# Test accuracy: threshold predicted probabilities at 0.5
sum(y.test == (pred > 0.5)) / length(y.test)
# Save model (can be loaded again via lgbm.booster.load(filename))
lgbm.booster.save(handle.booster, filename = "/tmp/model.txt")



# Catboost
set.seed(1)
library(titanic)
library(caret)
library(catboost)
# Keep only complete cases from the Titanic training data
tt <- titanic::titanic_train[complete.cases(titanic::titanic_train), ]
data <- as.data.frame(as.matrix(tt), stringsAsFactors = TRUE)
# Drop identifier-like columns and separate the target
drop_columns <- c("PassengerId", "Survived", "Name", "Ticket", "Cabin")
x <- data[, !(names(data) %in% drop_columns)]
y <- data[, "Survived"]
fit_control <- trainControl(method = "cv", number = 4, classProbs = TRUE)
grid <- expand.grid(depth = c(4, 6, 8),
                    learning_rate = 0.1,
                    iterations = 100,
                    l2_leaf_reg = 1e-3,
                    rsm = 0.95,
                    border_count = 64)
report <- train(x, as.factor(make.names(y)),
                method = catboost.caret, verbose = TRUE,
                preProc = NULL, tuneGrid = grid, trControl = fit_control)
print(report)
importance <- varImp(report, scale = FALSE)
print(importance)
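As with the other caret models above, the tuned CatBoost model can then be used for prediction; a minimal follow-up, here reusing the training predictors x purely for illustration:

# Predict class labels with the tuned model (illustrative: reuses training data)
predicted <- predict(report, x)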
